Quantifying Semantics using Complex Network Analysis

Authors

  • Christian Biemann
  • Stefanie Roos
  • Karsten Weihe

Abstract

Though it is generally accepted that language models do not capture all aspects of real language, no adequate measures to quantify their shortcomings have been proposed until now. We use n-gram models as workhorses to demonstrate that the differences between natural and generated language are indeed quantifiable. More specifically, we present two algorithmic approaches and demonstrate that each of them can be used both to distinguish real text from generated text accurately and to quantify the difference. We thereby obtain a coherent indication of how far a language model is from naturalness. Both methods are based on the analysis of co-occurrence networks. One uses a specific graph cluster measure, the transitivity. The other uses a specific kind of motif analysis, in which the frequencies of selected motifs are compared. In our study, artificial texts are generated by n-gram models for n = 2, 3, 4. We found that the larger n is, the smaller the distance between generated and natural text. However, even for n = 4, the distance is still large enough to allow an accurate distinction. The motif approach additionally gives deeper insight into the semantic properties of natural language that evidently cause these differences: polysemy and synonymy. To complete the picture, we show that another motif-based approach, by Milo et al. (2004), does not allow such a distinction. Using our method, it becomes possible for the first time to measure the deficiencies of generative language models with regard to the semantics of natural language.
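The transitivity-based comparison described above can be illustrated with a small Python sketch. The sketch works in four steps:

1. It trains an n-gram model on a toy corpus.
2. It samples artificial sentences from that model.
3. It builds a word co-occurrence network for each text.
4. It compares the transitivity of the two networks, i.e. 3 × (number of triangles) / (number of connected triples).

The network construction is an assumption made for illustration. Words are nodes, and two words are linked if they occur in the same sentence, with no significance filtering. The abstract does not state the paper's exact construction. The function names (`train_ngram`, `generate_sentence`, `cooccurrence_graph`, `transitivity`) and the toy corpus are hypothetical. The motif analysis is not sketched here.

```python
import random
from collections import defaultdict
from itertools import combinations


def train_ngram(sentences, n):
    """Count next-word frequencies for every (n-1)-word context (hypothetical helper)."""
    model = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        # Pad with start symbols so the first words also have a full context.
        tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            model[context][tokens[i]] += 1
    return model


def generate_sentence(model, n, rng, max_len=50):
    """Sample one sentence by repeatedly drawing the next word from the counts."""
    context = ("<s>",) * (n - 1)
    out = []
    while len(out) < max_len:  # guard against very long samples
        followers = model[context]
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights)[0]
        if word == "</s>":
            break
        out.append(word)
        # Slide the context window: drop the oldest word, append the new one.
        context = (context + (word,))[1:]
    return out


def cooccurrence_graph(sentences):
    """Assumed construction: link every pair of distinct words sharing a sentence."""
    adj = defaultdict(set)
    for sent in sentences:
        for u, v in combinations(sorted(set(sent)), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj


def transitivity(adj):
    """Global transitivity: 3 * triangles / connected triples."""
    closed = 0   # sum over vertices of triangles through that vertex = 3 * triangles
    triples = 0  # connected triples centred at each vertex
    for v, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Hypothetical toy corpus; a meaningful comparison needs a large real corpus.
    natural = [s.split() for s in [
        "the bank raised interest rates",
        "we sat on the river bank",
        "the river flooded the town",
        "interest rates fell again",
    ]]
    rng = random.Random(42)
    t_natural = transitivity(cooccurrence_graph(natural))
    for n in (2, 3, 4):
        model = train_ngram(natural, n)
        generated = [generate_sentence(model, n, rng) for _ in range(5 * len(natural))]
        print(n, t_natural, transitivity(cooccurrence_graph(generated)))
```

On a corpus this small the printed values carry little meaning. The abstract's findings concern comparisons on real text, so the toy run only shows how the pieces fit together.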

Similar resources

Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages

Reverse engineering of network applications especially from the security point of view is of high importance and interest. Many network applications use proprietary protocols which specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...

Translation and Hybridity in Scenes and Frames Semantics

 The present study is a theoretical attempt to illustrate how Fillmore's Scenes and Frames Semantics (SFS) could be employed as a framework to portray the process of understanding and translating hybrid texts. It first reviews the origin of SFS; then it maps SFS onto Nida’s linguistic model of translation process and the Interpretive Theory of Translation; it examines in the next section, withi...

Performance Appraisal of Research and Development Projects Value-Chain for Complex Products and Systems: The Fuzzy Three-Stage DEA Approach

The purpose of the current research is to provide a performance appraisal system capable of considering the value chain network structure of research and development (R&D) projects for Complex products and systems (CoPS) under uncertainty of data. Therefore, in order to achieve this goal, a network data envelopment analysis (NDEA) approach and the possibilistic programming to provide a new fuzz...

Bank’s Corporate Governance: Quantifying the Effects in Iranian Banking Networks

The most important tool for promoting the bank’s stability and health is the establishment of a standard corporate governance structure for managing the bank's business. Redesigning the relationships between bank management, shareholders and the rest of the bank’s stockholder, including the objectives, the risk and audit indices, and internal control of the bank, is recognized as the foundation...

I-13: Transcriptome Dynamics of Human and Mouse Preimplantation Embryos Revealed by Single Cell RNA-Sequencing

Background: Mammalian preimplantation development is a complex process involving dramatic changes in the transcriptional architecture. However, it is still unclear about the crucial transcriptional network and key hub genes that regulate the proceeding of preimplantation embryos. Materials and Methods: Through single-cell RNAsequencing (RNA-seq) of both human and mouse preimplantation embryos, ...

Publication date: 2012